ROCm and HIP: A Detailed 10-Chapter Course Guide: The Parallel Pivot: Mapping Sequential Logic to GPU Threads

The Parallel Pivot is a fundamental shift in computational philosophy: from temporal ordering (performing one operation after another) to spatial distribution (performing every operation at once across the grid).

1. The Independence Heuristic

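This is the golden rule of GPU computing: if the problem is 'apply some operation independently to each of N elements', this data-parallel mapping is the first one to try. It is the low-hanging fruit of GPU acceleration, because the overhead of managing threads becomes small relative to the throughput gained from massive concurrency.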
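
One way to test the heuristic is to check whether the loop body still gives the same answer when the iteration order changes. The sketch below is plain C++ running on the CPU, not HIP code, and the names `saxpy_element`, `saxpy_forward`, and `saxpy_reverse` are hypothetical, chosen only for illustration. Each call touches only element `i`, so the order of calls should not matter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work for ONE element: reads x[i] and y[i], writes only y[i].
// No call depends on the result of any other call.
inline void saxpy_element(std::size_t i, float a,
                          const std::vector<float>& x,
                          std::vector<float>& y) {
    y[i] = a * x[i] + y[i];
}

// Sequential form: the loop imposes the temporal order 0, 1, ..., n-1.
void saxpy_forward(float a, const std::vector<float>& x,
                   std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        saxpy_element(i, a, x, y);
    }
}

// Same work in the opposite order. If the elements are truly
// independent, the result is identical to saxpy_forward.
void saxpy_reverse(float a, const std::vector<float>& x,
                   std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = x.size(); i-- > 0;) {
        saxpy_element(i, a, x, y);
    }
}
```

If the forward and reverse versions agree, the body is a reasonable candidate for one-thread-per-element mapping. A loop such as a running prefix sum, where step `i` reads the result of step `i - 1`, would fail this check.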

2. Precision and Data Payload

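HIP kernels typically process large arrays of primitive types. High-performance graphics and machine learning usually use float (single precision); scientific simulations that need very high numerical stability use double (double precision).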
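
The trade-off between the two types can be sketched on the CPU in plain C++. The helpers `payload_bytes` and `accumulate_steps` are hypothetical names introduced here for illustration; they are not part of HIP. The first reports how many bytes an array of `n` elements occupies, and the second repeats the same addition many times so that rounding error has a chance to build up.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by an array of n elements of type T (the data payload).
template <typename T>
constexpr std::size_t payload_bytes(std::size_t n) {
    return n * sizeof(T);
}

// Adds `step` to an accumulator `count` times, entirely in type T,
// so every intermediate sum is rounded to T's precision.
template <typename T>
T accumulate_steps(std::size_t count, T step) {
    assert(count > 0);
    T sum = T(0);
    for (std::size_t k = 0; k < count; ++k) {
        sum += step;
    }
    return sum;
}
```

Comparing `accumulate_steps<float>(10000000, 0.1f)` and `accumulate_steps<double>(10000000, 0.1)` against the exact value 1,000,000 shows the float total drifting much further from it. On platforms where float is 4 bytes and double is 8, the float array needs half the memory for the same element count.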

3. From Iteration to Occupation

In CPU code, the processor 'visits' the data by looping over it. In GPU logic, the data 'occupies' the threads. You stop writing how to iterate and start writing what a single worker should do at a specific coordinate.

$$\text{index } i = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$$
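
The index formula can be traced on the CPU before writing a real kernel. The sketch below is plain C++, not HIP: `ThreadCoord` is a hypothetical stand-in for the built-in `blockIdx.x`, `blockDim.x`, and `threadIdx.x` values that a HIP kernel reads directly, and `emulate_launch` uses nested loops only to enumerate the coordinates that a GPU would run concurrently. The helper names are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a thread's built-in coordinates (x dimension only).
struct ThreadCoord {
    unsigned blockIdx_x;   // which block this thread belongs to
    unsigned blockDim_x;   // threads per block
    unsigned threadIdx_x;  // position of the thread inside its block
};

// i = blockIdx.x * blockDim.x + threadIdx.x
inline std::size_t global_index(const ThreadCoord& t) {
    return static_cast<std::size_t>(t.blockIdx_x) * t.blockDim_x + t.threadIdx_x;
}

// Number of blocks needed to cover n elements (ceiling division).
inline unsigned blocks_for(std::size_t n, unsigned block_dim) {
    assert(block_dim > 0);
    return static_cast<unsigned>((n + block_dim - 1) / block_dim);
}

// What a single worker does at its coordinate: square one element.
inline void square_worker(const ThreadCoord& t,
                          const std::vector<float>& in,
                          std::vector<float>& out) {
    const std::size_t i = global_index(t);
    if (i < in.size()) {  // boundary check: the last block may have spare threads
        out[i] = in[i] * in[i];
    }
}

// CPU emulation of a 1D launch. The loops only enumerate coordinates;
// on a GPU the workers would run concurrently, in no guaranteed order.
void emulate_launch(unsigned block_dim,
                    const std::vector<float>& in,
                    std::vector<float>& out) {
    assert(in.size() == out.size());
    const unsigned grid_dim = blocks_for(in.size(), block_dim);
    for (unsigned b = 0; b < grid_dim; ++b) {
        for (unsigned t = 0; t < block_dim; ++t) {
            square_worker({b, block_dim, t}, in, out);
        }
    }
}
```

For 10 elements and a block size of 4, `blocks_for` gives 3 blocks, so 12 workers are enumerated and the last two skip their write because of the boundary check. The loop counter from the CPU version never appears inside `square_worker`; the worker derives its index from its coordinate alone.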

QUESTION 1

What is the primary heuristic for deciding if a problem is suitable for the 'Parallel Pivot'?

The problem requires complex recursion.

The problem involves applying an operation independently to N elements.

The problem must be solved in a strict temporal order.

The problem uses only integer arithmetic.

QUESTION 2

In the context of the Parallel Pivot, what does the term 'Occupation' refer to?

The CPU visiting each index in a for-loop.

How many blocks are currently queued in the GPU.

Data 'occupying' a specific thread at a specific coordinate.

The percentage of memory used by the float arrays.

QUESTION 3

Which data types are most commonly handled by HIP kernels for high numerical stability in science?

bool and char

int and long

float and double

void and pointer

QUESTION 4

When pivoting a loop into a kernel, what replaces the loop counter `i`?

The return value of the function.

A global thread identity calculated from grid/block dimensions.

The hipMalloc address.

The host-side iteration variable.

QUESTION 5

Fill in the blank: To ensure production reliability even in basic kernels, you must ______.

Only use float types.

Add explicit error-checking macros everywhere.

Use a single thread per block.

Avoid all boundary checks.